DOCUMENT RESUME

ED 444 829                                        SE 063 843

AUTHOR          Meyer, Robert H.
TITLE           Value-Added Indicators: A Powerful Tool for Evaluating
                Science and Mathematics Programs and Policies.
INSTITUTION     National Inst. for Science Education, Madison, WI.
SPONS AGENCY    Wisconsin Univ., Madison.; National Science Foundation,
                Arlington, VA.
PUB DATE        2000-06-00
NOTE            9p.
CONTRACT        RED-9452971
AVAILABLE FROM  National Institute for Science Education, University of
                Wisconsin-Madison, 1025 W. Johnson Street, Madison, WI
                53706. For full text: http://www.nise.org.
PUB TYPE        Collected Works - Serials (022)
JOURNAL CIT     NISE Brief; v3 n3 Jun 2000
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     Elementary Secondary Education; Mathematics Education;
                *Outcomes of Education; *Program Evaluation; Science
                Education; *Scoring Formulas; *Statistical Analysis
IDENTIFIERS     *Value Added



ABSTRACT 



This issue of NISE Brief discusses the weaknesses of the most
commonly used educational outcome indicators (average and median test scores
and proficiency-level indicators) and the advantages of value-added
indicators. It offers a critique of the average test score as a measure of
school and program performance, using an example based on national data.
The data requirements of value-added indicators are also discussed.
(Contains 13 references.) (ASK)










REPORTING ON ISSUES AND RESEARCH IN SCIENCE, MATHEMATICS, ENGINEERING, AND TECHNOLOGY EDUCATION



Value-Added Indicators: 

A Powerful Tool for Evaluating Science 
and Mathematics Programs and Policies 

by Robert H. Meyer 






States and districts 
are increasingly 
turning to school 
accountability as 
an instrument 
of reform. 







Educational outcome indicators frequently are used to measure the
performance of schools, programs, and policies. Reliance on such
indicators is largely the result of a growing demand to hold these
entities accountable for their performance, defined in terms of
outcomes, such as standardized test scores in mathematics, science,
and reading, rather than inputs, such as teacher qualifications, class
size, or the quality of lab facilities. This Brief discusses the
weaknesses of the most commonly used educational outcome indicators
(average and median test scores and proficiency-level indicators) and
the advantages of value-added indicators.¹ Several major conclusions
emerge from the analysis.

First, the most common educational indicators are highly flawed as
measures of school and program performance, even if they are derived
from highly valid assessments. As a result, they are of limited value,
if not useless,



National Institute for Science Education

University of Wisconsin-Madison • National Center for Improving Science Education
Funded by the National Science Foundation • Web address: www.nise.org

Vol. 3, No. 3 - June 2000






1973 to 1982 and then partial recovery between 1982 and 1986. The
eleventh-grade data, by themselves, are fully consistent with the
premise that academic reforms in the early and mid-1980s generated
substantial gains in academic achievement. In fact, an analysis of the
data based on a gain indicator (a value-added type indicator) rather
than an average test score suggests the opposite conclusion; see
Panel B of Table 1.

The gain indicator is similar to a true value-added indicator in that
it controls for differences among students in prior achievement. It
does so in a very simple and intuitive way: gain is the change in
average test scores over time (and across grades) for the same cohort
of students. For example, the gain in test scores for students who
were eleventh-grade students in 1986 is given by the average test
score of eleventh-grade students in 1986 minus the average test score
of seventh-grade students in 1982, four grades and four years earlier
(that is, 302.0 - 268.6 = 33.4). Unfortunately, the gain indicator,
unlike the value-added indicator, does not control for differences in
student, family, and neighborhood characteristics that contribute to
growth in student achievement. As a result, the gain indicator
reflects possible changes over time in the composition of the
population as well as changes in school productivity.⁴ Nonetheless, it
is instructive to compare the gains in achievement experienced by
different cohorts.⁵
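The cohort arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the scores are the Panel A values from Table 1, the first survey year is taken to be 1973 (as Panel B and endnote 5 imply), and the names `scores` and `cohort_gain` are made up for the example.

```python
# Average NAEP mathematics scores from Panel A of Table 1, keyed by grade, then year.
# Assumption: the first survey year is 1973 (Panel B and endnote 5 refer to 1973).
scores = {
    3:  {1973: 219.1, 1978: 218.6, 1982: 219.0, 1986: 221.7},
    7:  {1973: 266.0, 1978: 264.1, 1982: 268.6, 1986: 269.0},
    11: {1973: 304.4, 1978: 300.4, 1982: 298.5, 1986: 302.0},
}
years = [1973, 1978, 1982, 1986]


def cohort_gain(start_grade, start_year, end_year, grade_span=4):
    """Gain for one cohort: later score at the higher grade minus the earlier score."""
    later = scores[start_grade + grade_span][end_year]
    earlier = scores[start_grade][start_year]
    return round(later - earlier, 1)


# One gain per cohort: the same students observed four grades apart.
gains = {
    (grade, y0, y1): cohort_gain(grade, y0, y1)
    for grade in (3, 7)
    for y0, y1 in zip(years, years[1:])
}

for (grade, y0, y1), gain in gains.items():
    print(f"grade {grade} to {grade + 4}, {y0} to {y1}: {gain}")
```

Each gain follows a cohort diagonally through the table (for example, seventh grade in 1982 to eleventh grade in 1986), rather than comparing the same grade across years.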

As indicated in Panel B, the achievement growth of high school
students (from seventh to eleventh grade) during the 1982 to 1986
period was actually no better than achievement growth during previous
periods. In fact, the gain from seventh to eleventh grade was slightly
lower during the 1982 to 1986 period (33.4 points) than in the two
previous periods (34.4 points each).
The rise in eleventh-grade math scores 
from 1982 to 1986 stems from an earlier 
increase in achievement growth for that 
cohort rather than from an increase in 
achievement growth over grades seven to 
eleven. In short, these data provide no 
support for the notion that high school 
academic reforms generated significant 
increases in test scores during the mid- 
1980s. These data also vividly confirm 
the general superiority of the gain indica- 
tor, relative to level indicators such as the 
average test score, as a measure of educa- 
tional productivity. 

It would be interesting to report the 
above analysis using true value-added as 
opposed to gain indicators. Unfortu- 
nately, the NAEP data do not permit 
such an analysis to be conducted, since 
the same students are not sampled for 
two consecutive NAEP surveys. This 
weakness in NAEP data could be reme- 
died by switching to a survey design that 
was at least partially longitudinal. 



Value-Added Indicators: 

Data Requirements 

Given the problems that exist with the 
average test score and other level indica- 
tors and, to a lesser degree, the gain indi- 
cator, it is important to consider whether 
value-added indicators could potentially 
be used as the primary tool for evaluating 
the performance of schools and pro- 
grams. There are at least two reasons to 
be optimistic in this regard. First, value- 
added models have been used extensively 
over the last three decades by evaluators 
and other researchers interested in educa- 
tion and training programs. Second, a 
number of districts and states, including 
Dallas, Minneapolis, South Carolina, 
and Tennessee, have successfully implemented
value-added indicator systems.⁶

Nonetheless, despite the promise of 
value-added indicator systems, it is clear 
that they require a major commitment. 
In particular, districts and states must be 
prepared to (1) assess students frequently
and (2) develop comprehensive district 
or state data systems that contain infor- 
mation on student test scores and 
student, family, and community charac- 
teristics. The need for frequent testing 
stems from the fact that value-added 
indicators are designed to measure the 
contribution of schools to growth in 
student achievement over a given time 
period. In order to be able to construct 



Table 1. NAEP Mathematics Examination Data

(A) Average Test Scores by Year

GRADE     1973     1978     1982     1986
3rd       219.1    218.6    219.0    221.7
7th       266.0    264.1    268.6    269.0
11th      304.4    300.4    298.5    302.0

(B) Average Test Score Gain From Year to Year for Each Cohort

GRADE          1973 to 1978    1978 to 1982    1982 to 1986
3rd to 7th     45.0            50.0            50.0
7th to 11th    34.4            34.4            33.4

Source: Dossey et al. (1988).








Student and family characteristics also contribute to student achievement. 



value-added (or gain) indicators 
it is therefore necessary to have 
achievement data for the same 
individuals at two points in time. 
Students who are missing either 
pre- or posttest data must be 
excluded from the analysis and 
thus from a district's accountability
and/or evaluation system.
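As a minimal sketch of this matching requirement, the Python fragment below computes gains only for students who have both scores and lists the students who drop out of the indicator. The records and the helper name `matched_gains` are hypothetical.

```python
# Hypothetical student records: (student_id, pretest, posttest).
# None marks a missing score.
records = [
    ("s1", 250.0, 262.0),
    ("s2", 240.0, None),   # left the district before the posttest
    ("s3", None, 275.0),   # arrived after the pretest
    ("s4", 270.0, 279.0),
]


def matched_gains(rows):
    """Return per-student gains for students with both a pretest and a posttest."""
    return {
        sid: post - pre
        for sid, pre, post in rows
        if pre is not None and post is not None
    }


gains = matched_gains(records)
excluded = [sid for sid, _, _ in records if sid not in gains]
mean_gain = sum(gains.values()) / len(gains)

print("students included:", sorted(gains))
print("students excluded:", excluded)
print("mean gain:", mean_gain)
```

In this made-up data set, half of the students are lost to the indicator, which is the exclusion problem that point-of-entry testing is meant to reduce.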

From the perspective of mea- 
suring school and program 
performance, an ideal testing 
program would do the following: 

• Test all students annually during the late spring. Many districts
currently follow this practice.

• Test all students who attend summer school at the end of the summer
(or in the fall at the beginning of the subsequent school year).
Following the recent boom in summer school enrollments, many districts
have begun testing students at the end of summer school.

• Test mobile students at the point of entry into the district (or
into a new school in the district).⁷ Minneapolis is one of the
districts that is pioneering the use of entry-point testing. As
indicated below, this component is very important in a comprehensive
assessment program.

Annual testing has three major 
advantages. First, it maximizes account- 
ability by localizing school and program 
performance to the most natural unit of 
accountability: the grade level or class- 
room. Second, it yields up-to-date infor- 
mation on performance. Third, it 
severely limits the number of students 
who would be excluded due to student 
mobility and, as a result, yields a data set 
that is likely to be highly representative of 







the school population as a whole and 
large enough to yield statistically reliable 
school performance estimates. On the 
other hand, less frequent testing, say 
testing at grades kindergarten, 4, 8, and 
12, might be acceptable for national purposes,
since student mobility is not really
an issue at the national level. For pur- 
poses of evaluating local school and 
program performance, however, the 
problems created by student mobility 
argue strongly for frequent testing. 
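A rough calculation shows why the interval between tests matters for mobility. The sketch below assumes, purely for illustration, that a fixed share of students leaves a school each year and that departures are independent from year to year; the 15 percent rate is hypothetical and does not come from the Brief.

```python
def share_with_both_tests(annual_mobility_rate, years_between_tests):
    """Share of a starting cohort still enrolled when the second test is given.

    Assumes (hypothetically) that the same fraction of students leaves each
    year and that departures are independent from year to year.
    """
    return (1.0 - annual_mobility_rate) ** years_between_tests


rate = 0.15  # hypothetical annual mobility rate, not a figure from the Brief
for gap in (1, 4):
    share = share_with_both_tests(rate, gap)
    print(f"{gap}-year gap between tests: {share:.0%} of students have both scores")
```

Under these assumptions, annual testing keeps 85 percent of a cohort in the matched sample, while a four-year gap keeps only about half.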

Adding a post-summer-school test 
yields one additional advantage; namely, 
it allows districts to separately evaluate 
the productivity of programs during the 
regular school year and those during the 
summer.⁸ Adding a point-of-entry test
for in-migrant students enables districts 






to evaluate the degree to which 
mobile students experience 
growth in achievement that is 
comparable to that of nonmobile 
students. Furthermore, it allows 
these students to be included in 
state and district performance 
indicators.⁹ When schools are
increasingly under pressure 
to achieve high (measured) 
performance, adopting an indi- 
cator/evaluation system that 
systematically excludes any group 
in the population seems particu- 
larly unwise. 

One potential obstacle to pro- 
ducing high-quality value-added 
indicators is the difficulty of 
collecting extensive information 
on student and family characteris- 
tics. These data are required as 
“control variables” in value-added 
models. In most schools the fol- 
lowing data are typically available 
from administrative records: race 
and ethnicity, gender, special edu- 
cation status, limited English pro- 
ficiency (LEP) status, eligibility 
for free or reduced-price lunch, 
and whether a family receives welfare 
benefits. Supplemental surveys of stu- 
dents and parents may be used to collect 
other information, such as parental 
education and income and family atti- 
tudes toward education (variables known 
to be powerful determinants of student 
achievement growth). 
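To make the role of control variables concrete, here is a minimal sketch of one possible value-added calculation. The Brief does not spell out a model at this point, so the form used here is an assumption: posttest scores are regressed on pretest scores, a single low-income flag, and one indicator per school, and the school coefficients are read as value-added estimates. All records are hypothetical and were constructed so that school A truly contributes one point more than school B.

```python
# Hypothetical records: (school, pretest, low_income flag, posttest).
# Scores were constructed as posttest = effect + 0.9 * pretest - 5 * low_income,
# with effect 33 for school A and 32 for school B (illustrative values only).
records = [
    ("A", 200.0, 1.0, 208.0),
    ("A", 220.0, 1.0, 226.0),
    ("A", 240.0, 1.0, 244.0),
    ("A", 260.0, 0.0, 267.0),
    ("B", 210.0, 0.0, 221.0),
    ("B", 230.0, 0.0, 239.0),
    ("B", 250.0, 0.0, 257.0),
    ("B", 270.0, 1.0, 270.0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


schools = sorted({school for school, *_ in records})

# Design matrix columns: pretest, low_income, then one indicator per school.
X = [
    [pre, low] + [1.0 if school == s else 0.0 for s in schools]
    for school, pre, low, _ in records
]
y = [post for *_, post in records]

# Ordinary least squares through the normal equations (X'X) beta = X'y.
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)

# School coefficients serve as the value-added estimates in this sketch.
value_added = dict(zip(schools, beta[2:]))

# Unadjusted average gain per school, for comparison.
raw_gain = {
    s: sum(post - pre for school, pre, _, post in records if school == s)
    / sum(1 for school, *_ in records if school == s)
    for s in schools
}

for s in schools:
    print(f"school {s}: raw gain {raw_gain[s]:.2f}, value-added {value_added[s]:.2f}")
```

In this constructed example the raw gain ranks school B above school A (6.75 versus 6.25) because school A serves more low-income students, while the adjusted estimates recover the ordering that was built into the data. A production system would use a statistical package and a richer set of controls than this single flag.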

The consequence of failing to control 
adequately for student, family, and com- 
munity characteristics is that value-added 
indicators may be contaminated if there 
are major differences across schools and 
programs in unmeasured (uncontrolled) 
student, family, and community charac- 
teristics. Thus, value-added indicators 
derived from models with “weak” predic- 
tors of student achievement growth 
might be only slightly better than gain 







A value-added approach to school accountability is useful and possible. 



indicators (better in the sense of being 
more highly correlated with a theoreti- 
cally perfect value-added indicator). Even 
so, they are likely to be much better indi- 
cators than average test scores. The key 
issue, of course, is not whether a particu- 
lar value-added indicator is perfect. 
Rather, the issue is whether the indicator 
provides a substantially better measure of 
school and program performance than 
other affordable indicators. 

The cost of implementing an assessment system that is sufficient to
support value-added (or gain) indicators is obviously higher than the
cost of a system that tests students only in selected grades
(say, 4, 8, and 12). The thrust of this
Brief is that an assessment system with 
infrequent testing is unlikely to produce 
outcome indicators that are valid for the 
purpose of measuring school perfor- 
mance. Thus, a district that is unwilling 
or unable to support the expense of 
frequent assessment should be very wary 
of using the achievement data that it does 
collect to evaluate the performance of 
schools and programs. 

Conclusions and 
Recommendations 

Average and median test scores and profi- 
ciency-level indicators, the most com- 
monly used indicators in American 
education, are highly suspect as indicators 
of school and program performance. These 
indicators suffer from four major deficien- 
cies: they fail to localize performance to the 
classroom or grade level; they aggregate 
information on performance that tends to 
be grossly out of date; they are contami- 
nated by student mobility; and they fail to 
measure the distinct contribution of 
schools and programs to growth in student 
achievement as separate from the contribu- 
tion due to student, family, and commu- 
nity factors. As a result, they are flawed 







measures for evaluation purposes and are 
weak, if not counterproductive, instru- 
ments of public accountability. 

The gain indicator (the change in 
average test scores from grade to grade for 
the same cohort of students) and the 
value-added indicator (the gain indicator 
statistically adjusted for differences across 
schools and programs in the type of stu- 
dents served) avoid the first of these four 
problems. In addition, the value-added 
indicator potentially eliminates the bias 
that exists in the gain indicator due to 
differences across schools in student, 
family, and community characteristics, 
particularly if it is based on a model that 
includes an extensive set of control vari- 
ables. In this case, it fully eliminates the 
incentive for schools to cream. 

The value-added approach to mea- 
suring school and program performance 
relies on a statistical model to identify the 
distinct contributions made by schools 
and programs to growth in student 
achievement. The quality of a value- 
added indicator is determined by four 
factors: the frequency with which stu- 



dents are tested, the quality and appro- 
priateness of the tests that underlie the 
indicators, the adequacy of the control 
variables included in the value-added 
models, and the appropriateness (validity)
of the statistical model used
to define the indicator. In terms of the
first factor, states and districts need to 
seriously consider testing students at 
every grade level, beginning with kinder- 
garten; to further improve their indicator 
systems, states and districts need to think 
about testing summer school students 
and in-migrant students at the point of 
entry into the school or district. With 
respect to the second and third issues, it 
is important that states and districts 
make it a major priority to collect exten- 
sive and reliable information on student 
and family characteristics and to develop 
state tests that are technically sound and 
fully attuned to their educational goals. 
Finally, ongoing research is needed to 
assess the sensitivity of estimates of school 
and program performance to alternative 
statistical models and alternative sets of 
control variables. 






ENDNOTES

1 Proficiency-level indicators measure the proportion of students
who score above a specified proficiency-level “cut point.”

2 Note that value-added indicators focus on the growth in
student achievement from one grade to the next for given cohorts
of students rather than on the change (or trend) over time in
average test scores for students at a given grade level. Value-added
indicators are thus based on longitudinal as opposed to
cross-sectional student data.

3 See Barton and Coley (1998) for a similar analysis that focuses
on gains in student achievement for students age 9 to 13 from 1978
to 1996.

4 The gain indicator also cannot be constructed if the before
(pre) and after (post) tests differ and have not been placed on the
same measuring scale.

5 NAEP was originally designed to permit this type of analysis.
In mathematics, the tests have generally been given every four years
at grade levels spaced four years apart. For this illustrative analysis,
we assume that average test scores in 1973 are comparable to the
unknown 1974 scores.

6 Millman (1997) contains detailed descriptions and analyses of
the Dallas and Tennessee value-added systems.

7 In principle, mobile students could also be tested prior to
migrating out of a school or district. On the other hand, these
students might not have much of an incentive to take a test just prior
to leaving a school, and if they did take such a test, the results
could be quite misleading. I do not see an easy way of including
out-migrants in an accountability system other than testing all
students at multiple points during the school year, an extremely
expensive proposition.

8 Optionally, all students, including non-summer-school students,
could be tested in the late spring and early fall. The advantage
of this approach is that it would allow schools to distinguish
growth in student achievement during the school year from
growth (or possibly decline) during the summer for all students. It
would also allow schools to better estimate the benefits of
participation in summer school. This approach would, of course, raise
the costs of testing.

9 In the absence of point-of-entry testing, mobile (in-migrant)
students must be excluded from value-added or gain indicators
because the students lack a prior measure of achievement.



REFERENCES

Barton, P. E., & Coley, R. J. (1998). Growth in school: Achievement
gains from the fourth to the eighth grade. Princeton, NJ: Policy
Information Center, Educational Testing Service.

Dossey, J. A., Mullis, I. V., Lindquist, M. M., & Chambers, D. L.
(1988). The mathematics report card: Are we measuring up?
Princeton, NJ: Educational Testing Service.

Meyer, R. H. (1996). Value-added indicators of school performance.
In E. A. Hanushek & D. W. Jorgenson (Eds.), Improving
America's schools: The role of incentives (pp. 197-223).
Washington, DC: National Academy Press.



FOR FURTHER READING 

Bryk, A. S., & Raudenbush, S. W. (1989). Quantitative models for
estimating teacher and school effectiveness. In R. D. Bock (Ed.),
Multilevel analysis of educational data (pp. 205-232). San Diego:
Academic Press.

Clotfelter, C. T., & Ladd, H. F. (1996). Recognizing and rewarding
success in public schools. In H. F. Ladd (Ed.), Holding schools
accountable (pp. 23-63). Washington, DC: Brookings.

Hanushek, E. A., & Taylor, L. (1990). Alternative assessments of the
performance of schools. Journal of Human Resources, 25(2),
179-201.

Mandeville, G. K. (1994). The South Carolina experience with
incentives. In T. A. Downes & W. A. Testa (Eds.), Midwest
approaches to school reform (pp. 69-97). Proceedings of a
conference held at the Federal Reserve Bank of Chicago,
October 26-27.

Meyer, R. H. (1999). The effects of math and math-related courses
in high school. In S. E. Mayer & P. E. Peterson (Eds.), Earning
and learning: How schools matter (pp. 169-204). Washington,
DC: Brookings.

Millman, J. (1997). Grading teachers, grading schools. Thousand
Oaks, CA: Corwin.

Raudenbush, S. W., & Willms, D. J. (1991). Schools, classrooms,
and pupils. San Diego: Academic Press.

Raudenbush, S. W., & Willms, D. J. (1995). The estimation of
school effects. Journal of Educational and Behavioral Statistics,
20(4), 307-336.

Sanders, W. L., & Horn, S. P. (1994). The Tennessee Value-Added
Assessment System (TVAAS): Mixed model methodology in
educational assessment. Journal of Personnel Evaluation in
Education, 8, 299-311.

Willms, D. J., & Raudenbush, S. W. (1989). A longitudinal
hierarchical linear model for estimating school effects and their
stability. Journal of Educational Measurement, 26, 209-232.








Robert H. Meyer is a Senior Scientist at the Wisconsin Center for
Education Research at the University of Wisconsin-Madison and a
Lecturer and Research Associate at the Harris Graduate School of
Public Policy Studies at the University of Chicago.

The author would like to thank Andrew Porter, NISE Director;
Adam Gamoran, University of Wisconsin-Madison; and Margaret
Goertz, Consortium for Policy Research in Education, University
of Pennsylvania, for very helpful comments and suggestions on this
Brief. Many of the issues discussed in this Brief are considered at
greater length in Meyer (1996).

Photos by Susan Lina Ruggles 



NISE Brief Staff

Director            Andrew Porter
Project Manager     Paula White
Editor              Deborah Stewart
Graphic Designer    IMDC Graphics



This Brief was supported by a cooperative agreement between the National
Science Foundation and the University of Wisconsin-Madison (Cooperative
Agreement No. RED-9452971). At UW-Madison, the National Institute for
Science Education is housed in the Wisconsin Center for Education Research
and is a collaborative effort of the College of Agricultural and Life Sciences, the
School of Education, the College of Engineering, and the College of Letters
and Science. The collaborative effort also is joined by the National Center for
Improving Science Education in Washington, DC. Any opinions, findings or
conclusions herein are those of the author(s) and do not necessarily reflect the
views of the supporting agencies.

No copyright is claimed on the contents of the NISE Brief. In reproducing
articles, please use the following credit: “Reprinted with permission from the NISE
Brief, published by the National Institute for Science Education,
UW-Madison.” If you reprint, please send a copy of the reprint to the NISE.

This publication is free on request. NISE Briefs are also available electronically
at our World Wide Web site: www.nise.org

National Institute for Science Education 
University of Wisconsin-Madison
1025 W. Johnson Street
Madison, WI 53706 
(608) 263-9250 
FAX: (608) 262-7428 

E-mail: niseinfo@macc.wisc.edu 

Vol. 3, No. 3 June 2000 


